A Hybrid Approach for Automatic Classification of Chinese Unknown Verbs

Authors

  • Hui Xin Zeng
  • Zhaolin Liu
  • Zhao-Ming Gao
  • Kejian Chen
Abstract

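This paper combines two methods to predict the syntactic category of unknown verbs. The first is a rule-based method, which induces from the training corpus the word-formation regularities of unknown verbs. It has two main decision procedures: (1) classify an unknown verb by the key characters it contains; (2) classify it by its compositional pattern.

The key-character procedure first divides verbs into four groups by length: two-character words, three-character words, four-character words, and words of five or more characters. Observation of the corpus shows that verbs of different lengths differ in structure, so the data are grouped by word length. For example, training on three-character words yields two rules, 「好」 and 「出」, that determine a verb's category; unknown verbs of other lengths have no such rules. Likewise, the 「化」 rule does not apply to two-character verbs.

The second part of the rule-based method classifies a verb by its compositional pattern. Some unknown verbs turn out to be composed in highly regular ways, so we generalized the compositions of the unknown verbs in the training corpus and obtained nine patterns. Over ten experiments, the rule-based method could handle about 23.19% of the unknown verbs on average, and 91.67% of its guesses were correct.

The second method, the similarity-based method, predicts the category of an unknown verb from examples similar to it. It relies mainly on HowNet (知網) and the Academia Sinica Chinese sentence-structure treebank, version 1.0 (中央研究院中文句結構樹資料庫 1.0), to measure semantic and categorial similarity. The similarity between the unknown verb and each known verb is computed, and the unknown verb receives the category of its most similar example.

* Institute of Information Science, Academia Sinica. 曾慧馨 (Hui Xin Zeng) E-mail: [email protected]; 陳克健 (Kejian Chen) E-mail: [email protected]
+ Department of Computer Science, National Chengchi University. E-mail: chaolin@nccu.edu.tw
** Department of Foreign Languages and Literatures, National Taiwan University. E-mail: [email protected]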
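The two-stage idea in the abstract (try the length-grouped key-character rules first, then fall back to the most similar known verb) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the category labels, the rule table beyond the characters 好, 出 and 化, the list of known verbs, and the character-overlap similarity (a stand-in for the HowNet and treebank measures) are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule table: length group -> {key character: category}.
# Only the characters 好, 出, 化 come from the abstract; the labels
# ("VH", "VC", "VHC") and the exact table layout are assumptions.
KEY_CHAR_RULES: Dict[int, Dict[str, str]] = {
    2: {},                                   # the 化 rule does not apply here
    3: {"好": "VH", "出": "VC", "化": "VHC"},  # 好 and 出 exist only for length 3
    4: {"化": "VHC"},
    5: {"化": "VHC"},                         # group 5 means "five or more"
}

# Hypothetical known verbs with categories, used by the similarity fallback.
KNOWN_VERBS: List[Tuple[str, str]] = [
    ("現代化", "VHC"),
    ("看見", "VE"),
    ("討論", "VE"),
]


def length_group(word: str) -> int:
    """Map a word to one of the four length groups (2, 3, 4, 5+)."""
    return min(max(len(word), 2), 5)


def classify_by_rule(word: str) -> Optional[str]:
    """Return a category if a key character for this length group occurs
    in the word (assumed matching criterion), otherwise None."""
    for key_char, category in KEY_CHAR_RULES[length_group(word)].items():
        if key_char in word:
            return category
    return None


def char_overlap(a: str, b: str) -> float:
    """Jaccard overlap of character sets; a simple stand-in for the
    semantic and categorial similarity used in the paper."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


def classify_by_similarity(word: str, known: List[Tuple[str, str]]) -> str:
    """Return the category of the most similar known verb."""
    _, best_category = max(known, key=lambda pair: char_overlap(word, pair[0]))
    return best_category


def classify(word: str, known: List[Tuple[str, str]] = KNOWN_VERBS) -> str:
    """Hybrid classifier: rules first, similarity as the fallback."""
    return classify_by_rule(word) or classify_by_similarity(word, known)
```

With this toy data, `classify("說出來")` is decided by the 出 rule, while `classify("看到")` matches no rule and takes the category of its nearest known verb. Trying the rules first mirrors the abstract's report that they cover only part of the unknown verbs but are usually right when they apply.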


Similar resources

中文動詞自動分類研究 (Automatic Classification of Chinese Unknown Verbs) [In Chinese]

We present a new method for automatic classification of Chinese unknown verbs. The method uses instance-based categorization with a k-nearest-neighbor classifier. The accuracy of the classifier is about 70.92%.


MULTI CLASS BRAIN TUMOR CLASSIFICATION OF MRI IMAGES USING HYBRID STRUCTURE DESCRIPTOR AND FUZZY LOGIC BASED RBF KERNEL SVM

Medical Image segmentation is to partition the image into a set of regions that are visually obvious and consistent with respect to some properties such as gray level, texture or color. Brain tumor classification is an imperative and difficult task in cancer radiotherapy. The objective of this research is to examine the use of pattern classification methods for distinguishing different types of...


Semantic Classification of Chinese Unknown Words

This paper describes a classifier that assigns semantic thesaurus categories to unknown Chinese words (words not already in the CiLin thesaurus and the Chinese Electronic Dictionary, but in the Sinica Corpus). The focus of the paper differs in two ways from previous research in this particular area. Prior research in Chinese unknown words mostly focused on proper nouns (Lee 1993, Lee, Lee and C...


Lexical Semantics and Selection of TAM in Bantu Languages: A Case of Semantic Classification of Kiswahili Verbs

The existing literature on Bantu verbal semantics demonstrated that inherent semantic content of verbs pairs directly with the selection of tense, aspect and modality formatives in Bantu languages like Chasu, Lucazi, Lusamia, and Shiyeyi. Thus, the gist of this paper is the articulation of semantic classification of verbs in Kiswahili based on the selection of TAM types. This is because the sem...


A hybridization of evolutionary fuzzy systems and ant Colony optimization for intrusion detection

A hybrid approach for intrusion detection in computer networks is presented in this paper. The proposed approach combines an evolutionary-based fuzzy system with an Ant Colony Optimization procedure to generate high-quality fuzzy-classification rules. We applied our hybrid learning approach to network security and validated it using the DARPA KDD-Cup99 benchmark data set. The results indicate t...


Transitivity in Light Verb Variations in Mandarin Chinese - A Comparable Corpus-based Statistical Approach

This paper adopts a comparable corpus-based approach to light verb variations in two varieties of Mandarin Chinese and proposes a transitivity (Hopper and Thompson 1980) based theoretical account. Light verbs are highly grammaticalized and lack strong collocation restrictions; hence it has been a challenge to empirical accounts. It is even more challenging to consider their variations between d...



Journal title:
  • IJCLCLP

Volume 7  Issue 

Pages  -

Publication date 2002